
    Performance Analysis of Output Threshold-Based Incremental Multiple-Relay Combining Scheme with Adaptive Modulation for Cooperative Networks

    In this paper, we propose an output threshold-based incremental multiple-relay combining scheme for cooperative amplify-and-forward relay networks with nonidentically distributed relay channels. Specifically, to achieve the required performance, we combine conventional incremental relaying with multiple-relay selection, where relays are adaptively selected based on a predetermined output threshold. Moreover, our proposed scheme adopts adaptive modulation to satisfy both the spectral-efficiency and error-rate requirements. For the proposed scheme, we first derive an upper bound on the output combined signal-to-noise ratio and then obtain its statistics, namely the cumulative distribution function (CDF), probability density function (PDF), and moment generating function (MGF), over independent, nonidentically distributed Rayleigh fading channels. Additionally, we analyze the system performance in terms of average spectral efficiency, average bit error rate, outage probability, and system complexity. Finally, numerical examples show that our proposed scheme leads to a performance improvement in cooperative networks.
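    The abstract summarizes rather than specifies the combining algorithm, but the general mechanism, activating relay branches one at a time until the combined output SNR crosses a threshold, is easy to sketch. Below is a minimal Monte Carlo sketch in Python, assuming exponentially distributed branch SNRs (which follows from Rayleigh fading) and maximal-ratio combining; all numeric parameters are illustrative, not the paper's.

```python
# Minimal Monte Carlo sketch of output threshold-based incremental
# multiple-relay combining (illustrative parameters, not the paper's model).
# Under Rayleigh fading, each branch SNR is exponentially distributed with
# mean equal to its average SNR; relays are activated one at a time until
# the combined (MRC) output SNR exceeds the output threshold.
import numpy as np

def combined_snr(avg_snrs, threshold, rng):
    """Return (output SNR, relays activated) for one channel realization."""
    total = rng.exponential(avg_snrs[0])   # direct source-destination branch
    used = 0
    for mean in avg_snrs[1:]:              # nonidentically distributed relay branches
        if total >= threshold:
            break                          # threshold met: stop activating relays
        total += rng.exponential(mean)     # MRC adds branch SNRs
        used += 1
    return total, used

rng = np.random.default_rng(0)
avg_snrs = [4.0, 3.0, 2.0, 1.5]            # assumed average SNRs: direct link + 3 relays
threshold = 6.0                            # assumed output threshold
trials = [combined_snr(avg_snrs, threshold, rng) for _ in range(100_000)]
snrs, used = map(np.array, zip(*trials))
print(f"outage P(SNR < {threshold}): {np.mean(snrs < threshold):.4f}")
print(f"average relays activated: {used.mean():.2f}")
```

    Sweeping the threshold in such a simulation exposes the trade-off the paper analyzes: a higher threshold lowers the outage probability at the cost of activating more relays, i.e. higher system complexity.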

    Efficient Video Representation Learning via Masked Video Modeling with Motion-centric Token Selection

    Self-supervised Video Representation Learning (VRL) aims to learn transferable representations from uncurated, unlabeled video streams that can be utilized for diverse downstream tasks. With recent advances in Masked Image Modeling (MIM), in which the model learns to predict randomly masked regions of an image given only the visible patches, MIM-based VRL methods have emerged and demonstrated their potential by significantly outperforming previous VRL methods. However, they require an excessive amount of computation due to the added temporal dimension. This is because existing MIM-based VRL methods overlook the spatial and temporal inequality of information density among the patches of incoming videos, resorting to random masking strategies and thereby wasting computation on predicting uninformative tokens/frames. To tackle these limitations of Masked Video Modeling, we propose a new token selection method that masks out the more important tokens according to objects' motions in an online manner, which we refer to as Motion-centric Token Selection. Further, we present a dynamic frame selection strategy that allows the model to focus on informative and causal frames with minimal redundancy. We validate our method on multiple benchmark datasets and on Ego4D, showing that a model pre-trained with our proposed method significantly outperforms state-of-the-art VRL methods on downstream tasks such as action recognition and object state change classification, while largely reducing memory requirements during pre-training and fine-tuning. Comment: 15 page
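    As a rough illustration of the masking idea, the sketch below scores each patch token by its frame-difference energy and masks the highest-motion tokens, so the model must predict the informative regions; the paper's actual motion measure, masking ratio, and online selection procedure may differ.

```python
# Toy motion-centric token masking, assuming motion is approximated by
# simple frame differencing (an assumption for illustration only).
import numpy as np

def motion_token_mask(video, patch=16, mask_ratio=0.75):
    """video: (T, H, W) grayscale frames.
    Returns a boolean (T, num_tokens) mask, True = masked (to be predicted)."""
    T, H, W = video.shape
    motion = np.abs(np.diff(video, axis=0, prepend=video[:1]))  # per-pixel motion
    ph, pw = H // patch, W // patch
    # Aggregate motion energy inside each non-overlapping patch.
    scores = (motion[:, :ph * patch, :pw * patch]
              .reshape(T, ph, patch, pw, patch)
              .sum(axis=(2, 4))
              .reshape(T, -1))                                  # (T, ph * pw)
    # Mask the highest-motion tokens: predicting informative regions
    # makes a harder, more useful pretext task than random masking.
    k = int(mask_ratio * scores.shape[1])
    order = np.argsort(-scores, axis=1)
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, order[:, :k], True, axis=1)
    return mask

video = np.random.rand(8, 224, 224)        # toy 8-frame clip
mask = motion_token_mask(video)
print(mask.shape, mask.mean())             # (8, 196), ~0.75 of tokens masked
```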

    Exploring Chemical Space with Score-based Out-of-distribution Generation

    A well-known limitation of existing molecular generative models is that the generated molecules highly resemble those in the training set. To generate truly novel molecules that may have even better properties for de novo drug discovery, more powerful exploration of the chemical space is necessary. To this end, we propose Molecular Out-Of-distribution Diffusion (MOOD), a score-based diffusion scheme that incorporates out-of-distribution (OOD) control into the generative stochastic differential equation (SDE) through a single hyperparameter, and thus requires no additional cost. Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients of a property predictor that guides the reverse-time diffusion process toward high-scoring regions with respect to target properties such as protein-ligand interactions, drug-likeness, and synthesizability. This allows MOOD to search for novel and meaningful molecules rather than generating unseen yet trivial ones. We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore those found by existing methods, and even the top 0.01% of the original training pool. Our code is available at https://github.com/SeulLee05/MOOD. Comment: ICML 202
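    A conceptual sketch of the two mechanisms the abstract names, a single-hyperparameter OOD control on the generative SDE and property-predictor guidance of the reverse-time process, might look as follows. The placement of the OOD knob, the Euler-Maruyama update, and all names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative property-guided reverse diffusion in the spirit of MOOD.
import numpy as np

def sample(score_fn, prop_grad_fn, dim, lam=0.1, guide=1.0,
           sigma=1.0, steps=1000, rng=None):
    rng = rng or np.random.default_rng(0)
    dt = 1.0 / steps
    # OOD control: inflate the prior so sampling starts farther from the
    # training distribution (one plausible reading of a one-hyperparameter knob).
    x = (1.0 + lam) * sigma * rng.standard_normal(dim)
    for i in range(steps, 0, -1):
        t = i / steps
        # The property predictor's gradient steers the reverse-time process
        # toward high-scoring (e.g. drug-like, synthesizable) regions.
        s = score_fn(x, t) + guide * prop_grad_fn(x, t)
        # Euler-Maruyama step of a simple reverse-time VE-SDE.
        x = x + sigma**2 * s * dt + sigma * np.sqrt(dt) * rng.standard_normal(dim)
    return x

# Toy usage: a standard-normal score and a constant "property" gradient.
toy_score = lambda x, t: -x
toy_grad = lambda x, t: np.ones_like(x)
print(sample(toy_score, toy_grad, dim=8)[:4])
```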

    Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models

    There has been significant progress in Text-To-Speech (TTS) synthesis technology in recent years, thanks to advances in neural generative modeling. However, existing methods for any-speaker adaptive TTS have achieved unsatisfactory performance due to their suboptimal accuracy in mimicking target speakers' styles. In this work, we present Grad-StyleSpeech, an any-speaker adaptive TTS framework based on a diffusion model that can generate highly natural speech closely matching a target speaker's voice, given only a few seconds of reference speech. Grad-StyleSpeech significantly outperforms recent speaker-adaptive TTS baselines on English benchmarks. Audio samples are available at https://nardien.github.io/grad-stylespeech-demo. Comment: ICASSP 202
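    As a toy illustration of the few-shot adaptation idea, the sketch below pools a style vector from a short reference mel-spectrogram and injects it into every iterative refinement step of the decoder; the pooling encoder and the conditioning rule are assumptions for illustration, not Grad-StyleSpeech's actual architecture.

```python
# Toy few-shot speaker conditioning for an iterative (diffusion-style) decoder.
import numpy as np

rng = np.random.default_rng(0)
n_mels, ref_frames, out_frames = 80, 200, 100

def style_embedding(ref_mel):
    # Stand-in for a learned reference encoder: mean-pool over time.
    return ref_mel.mean(axis=0)                     # (n_mels,)

def refine_step(x, style, scale=0.1):
    # FiLM-like conditioning (an assumption): nudge the estimate toward
    # the target speaker's style at every refinement step.
    return (1.0 - scale) * x + scale * style        # broadcasts over frames

ref_mel = rng.standard_normal((ref_frames, n_mels)) # "a few seconds" of reference
x = rng.standard_normal((out_frames, n_mels))       # start from noise
style = style_embedding(ref_mel)
for _ in range(50):                                 # iterative refinement loop
    x = refine_step(x, style)
print(np.abs(x - style).mean())                     # output converges toward the style
```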